168 research outputs found

    Lifelong Federated Reinforcement Learning: A Learning Architecture for Navigation in Cloud Robotic Systems

    This paper was motivated by the problem of how to make robots fuse and transfer their experience so that they can effectively use prior knowledge and quickly adapt to new environments. To address the problem, we present a learning architecture for navigation in cloud robotic systems: Lifelong Federated Reinforcement Learning (LFRL). In this work, we propose a knowledge fusion algorithm for upgrading a shared model deployed on the cloud. Then, effective transfer learning methods in LFRL are introduced. LFRL is consistent with human cognitive science and fits well in cloud robotic systems. Experiments show that LFRL greatly improves the efficiency of reinforcement learning for robot navigation. The cloud robotic system deployment also shows that LFRL is capable of fusing prior knowledge. In addition, we release a cloud robotic navigation-learning website based on LFRL.

    Agricultural Robot for Intelligent Detection of Pyralidae Insects

    Pyralidae insects are among the main pests of economic crops. However, manual detection and identification of Pyralidae insects is labor intensive and inefficient, and subjective factors can influence recognition accuracy. To address these shortcomings, an insect monitoring robot and a new method to recognize Pyralidae insects are presented in this chapter. First, the robot acquires images by performing a fixed action and detects whether there are Pyralidae insects in the images. The recognition method obtains the total probability image by using reverse mapping of the histogram and multi-template images, and then the image contour can be extracted quickly and accurately by using constraint Otsu. Finally, the contours can be filtered according to their Hu moment, perimeter, and area features, and recognition results marked with triangles can be obtained. According to the recognition results, the speed of the robot car and mechanical arm can be adjusted adaptively. The theoretical analysis and experimental results show that the proposed scheme has high timeliness and high recognition accuracy in the natural planting scene.
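    The thresholding step mentioned in the abstract above can be illustrated with a minimal sketch of plain Otsu thresholding. This is an assumption-laden illustration, not the chapter's method: the abstract names a "constraint Otsu" variant whose constraint is not described here, so the sketch implements only the standard, unconstrained algorithm. The function name `otsu_threshold` and the flat list-of-pixels input are hypothetical choices made for the example.

```python
from typing import Sequence


def otsu_threshold(pixels: Sequence[int], levels: int = 256) -> int:
    """Return the grey level t that maximizes the between-class variance.

    Pixels with value <= t form one class, pixels with value > t the other.
    This is standard Otsu, not the constrained variant named in the abstract.
    """
    if not pixels:
        raise ValueError("pixels must be non-empty")

    # Build the grey-level histogram.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    total_sum = sum(level * count for level, count in enumerate(hist))

    best_t, best_score = 0, -1.0
    w0 = 0    # number of pixels in the lower class so far
    sum0 = 0  # sum of intensities in the lower class so far
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so the split is meaningless
        mu0 = sum0 / w0
        mu1 = (total_sum - sum0) / w1
        # Proportional to the between-class variance (constant factor dropped).
        score = w0 * w1 * (mu0 - mu1) ** 2
        if score > best_score:
            best_score, best_t = score, t
    return best_t


# Usage: a toy "image" with a dark background and a bright object.
pixels = [10] * 50 + [200] * 50
t = otsu_threshold(pixels)
mask = [p > t for p in pixels]  # True where the pixel belongs to the bright class
```

    In a pipeline like the one described, the input would be the probability image rather than raw grey levels, and the resulting mask would be passed on to contour extraction and filtering.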

    Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy

    Proximal policy optimization and trust region policy optimization (PPO and TRPO) with actor and critic parametrized by neural networks achieve significant empirical success in deep reinforcement learning. However, due to nonconvexity, the global convergence of PPO and TRPO remains less understood, which separates theory from practice. In this paper, we prove that a variant of PPO and TRPO equipped with overparametrized neural networks converges to the globally optimal policy at a sublinear rate. The key to our analysis is the global convergence of infinite-dimensional mirror descent under a notion of one-point monotonicity, where the gradient and iterate are instantiated by neural networks. In particular, the desirable representation power and optimization geometry induced by the overparametrization of such neural networks allow them to accurately approximate the infinite-dimensional gradient and iterate. Comment: A short versio

    Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms

    ReParameterization (RP) Policy Gradient Methods (PGMs) have been widely adopted for continuous control tasks in robotics and computer graphics. However, recent studies have revealed that, when applied to long-term reinforcement learning problems, model-based RP PGMs may experience chaotic and non-smooth optimization landscapes with exploding gradient variance, which leads to slow convergence. This is in contrast to the conventional belief that reparameterization methods have low gradient estimation variance in problems such as training deep generative models. To comprehend this phenomenon, we conduct a theoretical examination of model-based RP PGMs and search for solutions to the optimization difficulties. Specifically, we analyze the convergence of the model-based RP PGMs and pinpoint the smoothness of function approximators as a major factor that affects the quality of gradient estimation. Based on our analysis, we propose a spectral normalization method to mitigate the exploding variance issue caused by long model unrolls. Our experimental results demonstrate that proper normalization significantly reduces the gradient variance of model-based RP PGMs. As a result, the performance of the proposed method is comparable or superior to other gradient estimators, such as the Likelihood Ratio (LR) gradient estimator. Our code is available at https://github.com/agentification/RP_PGM. Comment: Published at NeurIPS 202
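    As a rough illustration of the spectral normalization idea named in the abstract above, the sketch below rescales a weight matrix by an estimate of its largest singular value, which bounds the Lipschitz constant of the corresponding linear map by about 1. It is a generic textbook version written for this listing, not the authors' implementation from the linked repository; the function names, the fixed starting vector, and the iteration count are all assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def _matvec(m: Matrix, v: List[float]) -> List[float]:
    """Multiply matrix m by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def _norm(v: List[float]) -> float:
    """Euclidean norm of v."""
    return math.sqrt(sum(x * x for x in v))


def spectral_norm(w: Matrix, iters: int = 100) -> float:
    """Estimate the largest singular value of w by power iteration on w^T w.

    Assumption: the all-ones start vector is not orthogonal to the top
    right-singular vector; a random start would be more robust.
    """
    wt = [list(col) for col in zip(*w)]  # transpose of w
    v = [1.0] * len(w[0])
    for _ in range(iters):
        v = _matvec(wt, _matvec(w, v))
        n = _norm(v)
        if n == 0.0:
            return 0.0  # the start vector lies in the null space of w
        v = [x / n for x in v]
    return _norm(_matvec(w, v))


def spectral_normalize(w: Matrix, iters: int = 100) -> Matrix:
    """Return w divided by its estimated largest singular value."""
    sigma = spectral_norm(w, iters)
    if sigma == 0.0:
        return [row[:] for row in w]  # nothing to rescale
    return [[x / sigma for x in row] for row in w]


# Usage: a diagonal matrix whose largest singular value is 3.
w = [[3.0, 0.0], [0.0, 1.0]]
w_sn = spectral_normalize(w)
```

    Keeping each layer's spectral norm near 1 limits how much a perturbation can grow through repeated applications of the model, which is the kind of smoothness the abstract identifies as affecting gradient variance over long unrolls.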

    p38MAPK plays a pivotal role in the development of acute respiratory distress syndrome

    Acute respiratory distress syndrome (ARDS) is a life-threatening illness characterized by a complex pathophysiology, involving not only the respiratory system but also nonpulmonary distal organs. Although advances in the management of ARDS have led to a distinct improvement in ARDS-related mortality, ARDS is still a life-threatening respiratory condition with long-term consequences. A better understanding of the pathophysiology of this condition will allow us to create a personalized treatment strategy for improving clinical outcomes. In this article, we present a general overview of p38 mitogen-activated protein kinase (p38MAPK) and recent advances in understanding its functions. We consider the potential of the pharmacological targeting of p38MAPK pathways to treat ARDS.